A comparison of grapheme and phoneme-based units for Spanish spoken term detection

نویسندگان

  • Javier Tejedor
  • Dong Wang
  • Joe Frankel
  • Simon King
  • José Colás Pasamontes
چکیده

The ever-increasing volume of audio data available online through the world wide web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search techniques are the two most common approaches used by such systems. In keyword spotting, models or templates are defined for each search term prior to accessing the speech and used to find matches. Lattice search (referred to as spoken term detection), uses a pre-indexing of speech data in terms of word or sub-word units, which can then quickly be searched for arbitrary terms without referring to the original audio. In both cases, the search term can be modelled in terms of sub-word units, typically phonemes. For in-vocabulary words (i.e. words that appear in the pronunciation dictionary), the letter-to-sound conversion systems are accepted to work well. However, for out-ofvocabulary (OOV) search terms, letter-to-sound conversion must be used to generate a pronunciation for the search term. This is usually a hard decision (i.e. not probabilistic and with no possibility of backtracking), and errors introduced at this step are difficult to recover from. We therefore propose the direct use of graphemes (i.e., letter-based sub-word units) for acoustic modelling. This is expected to work particularly well in languages such as Spanish, where despite the letter-to-sound mapping being very regular, the correspondence is not one-to-one, and there will be benefits from avoiding hard decisions at early stages of processing. In this article, we compare three approaches for Spanish keyword spotting or spoken term detection, and within each of these we compare acoustic modelling based on phone and grapheme units. Experiments were performed using the Spanish geographical-domain ALBAYZIN corpus. Results achieved in the two approaches proposed for spoken term detection show us that trigrapheme units for acoustic modelling match or exceed the performance of phone-based acoustic models. In the method proposed for keyword spotting, the results achieved with each acoustic model are very similar. 2008 Elsevier B.V. All rights reserved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Grapheme based speech recognition

Large vocabulary speech recognition systems traditionally represent words in terms of subword units, usually phonemes. This paper investigates the potential of graphemes acting as subunits. In order to develop context dependent grapheme based speech recognizers several decision tree based clustering procedures are performed and compared to each other. Grapheme based speech recognizers in three ...

متن کامل

Graphemes as basic units for crosslingual speech recognition

This paper presents our work on grapheme based crosslingual speech recognition carried out within the MASPER initiative. The performance of monolingual grapheme based acoustic models is compared to the performance of monolingual acoustic models based on phonemes. The transfer between source and target language was done using an expert knowledge approach. For the experiments, German, Spanish, Hu...

متن کامل

Spanish dialects: phonetic transcription

It is well known that canonical Spanish, the dialectal variant ‘central’ of Spain, so called Castilian, can be transcribed by rules. This paper deals with the automatic grapheme to phoneme transcription rules in several Spanish dialects from Latin America. Spanish is a language spoken by more than 300 million people, has an important geographical dispersion compared among other languages and ha...

متن کامل

Hybrid Grapheme to Phoneme Conversion forUnlimited

Both dictionary-based and rule-based methods on grapheme-to-phoneme conversion have their own advantages and limitations. For example, a large sized phonetic dictionary and complex morphophonemic rules are required for the dictionary-based method and the LTS(letter to sound) rule-based method itself cannot model the complete morphophonemic constraints. This paper describes a grapheme-to-phoneme...

متن کامل

Improving grapheme-based ASR by probabilistic lexical modeling approach

There is growing interest in using graphemes as subword units, especially in the context of the rapid development of hidden Markov model (HMM) based automatic speech recognition (ASR) system, as it eliminates the need to build a phoneme pronunciation lexicon. However, directly modeling the relationship between acoustic feature observations and grapheme states may not be always trivial. It usual...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Speech Communication

دوره 50  شماره 

صفحات  -

تاریخ انتشار 2008